Changed SAMRecord.toString() to emit the SAM format string with all fields#1762
Merged
Conversation
nh13
reviewed
Apr 4, 2026
Member
nh13
left a comment
There was a problem hiding this comment.
- can you verify there's no newline, and if there is, strip it (he says knowingly)
- are there tools (picard, igv, gatk, fgbio, others) that rely on the
toStringreturned format (I know I know), but it is a public so a breaking change... - Should you add at least one regression test?
toStringwas previously lock free, but looking at getSAMString we have somesynchronizedmethods...- the previous method was lighter-weight, is this a concern?
- are there tests that (shudders) compared outputs using
toString? - is it time to remove the
formatdeprecated method?
The text formatting path used by SAMRecord.toString() / getSAMString() has been cleaned up: - SAMTextWriter.getSAMString no longer uses a JVM-wide synchronized static cache of a shared StringWriter/SAMTextWriter. It now allocates a fresh StringWriter per call. Without this change, every toString() call would have taken a global monitor. - SAMTextWriter.writeAlignment is split into a private writeAlignmentNoNewline plus a thin wrapper that appends '\n'. getSAMString uses the no-newline variant, so the result no longer needs to be trimmed. BREAKING CHANGE: SAMRecord.getSAMString() no longer terminates the returned String with a newline character. This brings SAMRecord into line with the other getSAMString() implementations (SAMSequenceRecord, SAMReadGroupRecord, SAMProgramRecord, SAMFileHeader), which already return newline-free strings. Callers that relied on the trailing '\n' as a separator (e.g. concatenating two records' SAM strings) must now insert their own separator. Callers that stripped the newline with .trim() can drop the call.
SAMRecord.format() was deprecated in favor of getSAMString() because it didn't guarantee a valid SAM text representation. Now that getSAMString() is the canonical formatter (and is used by toString()), drop format() and the private helpers that supported it: addField, formatTagValue, safeEquals. Also remove the SRALazyRecord.format() override, which existed only because format() bypassed lazy attribute loading; getSAMString() goes through SRALazyRecord.getBinaryAttributes() which already handles lazy loading. And delete a couple of stale commented-out debug prints that referenced format(). BREAKING CHANGE: SAMRecord.format() has been removed. Callers should use SAMRecord.getSAMString() instead.
8d787ad to
6ba9463
Compare
Member
Author
|
Thanks for the review @nh13. I have done several things based upon it:
|
tfenne
added a commit
that referenced
this pull request
Apr 25, 2026
…ields (#1762) * Changed SAMRecord.toString() to emit the SAM format string with all fields. * Drop sync + trailing newline from SAMRecord.getSAMString. * Remove deprecated SAMRecord.format() and now-dead helpers. The text formatting path used by SAMRecord.toString() / getSAMString() has been cleaned up: - SAMTextWriter.getSAMString no longer uses a JVM-wide synchronized static cache of a shared StringWriter/SAMTextWriter. It now allocates a fresh StringWriter per call. Without this change, every toString() call would have taken a global monitor. - SAMTextWriter.writeAlignment is split into a private writeAlignmentNoNewline plus a thin wrapper that appends '\n'. getSAMString uses the no-newline variant, so the result no longer needs to be trimmed. BREAKING CHANGE: SAMRecord.getSAMString() no longer terminates the returned String with a newline character. This brings SAMRecord into line with the other getSAMString() implementations (SAMSequenceRecord, SAMReadGroupRecord, SAMProgramRecord, SAMFileHeader), which already return newline-free strings. Callers that relied on the trailing '\n' as a separator (e.g. concatenating two records' SAM strings) must now insert their own separator. Callers that stripped the newline with .trim() can drop the call. BREAKING CHANGE: SAMRecord.format() has been removed. Callers should use SAMRecord.getSAMString() instead.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
SAMRecord has historically output an abbreviated format in toString(). I think more often than not this is confusing to users, since a SAMRecord has a well defined String format. Moreover the toString() method is used extensively by TestNG to display when two SAMRecord instances are not equal - and the vast majority of the time the toString() output did not contain the differences.
Things to think about before submitting: